PhD Chapter 1

Campaign 2016 - Analyses and regressions


This series of files compiles the analyses for Chapter 1, covering the 2016 regional campaign.

All analyses were performed with PRIMER-e 6 and R 4.1.2.



We used data from subtidal ecosystems (see metadata files for more information). Only stations sampled for both abiotic parameters and benthic species were included.

Selected variables for the analyses:

Abundances of Mesodesma arctatum (Marc) and Cistenides granulata (Cgra) were also considered (see IndVal and SIMPER results).

Because metal concentration data are missing outside BSI, two designs were used:


1. Permutational Analyses of Covariance

Results of the univariate PermANCOVAs on each parameter and of the multivariate PermANCOVA on the whole benthic community, with depth as a covariate, are presented in the table below. Variables were standardized by mean and standard deviation, and taxon densities were log(x + 1) transformed.

Variable Condition Region(Co) Depth Significant groups of similar regions (p > 0.05)
om S S {CPC BDA MR}
gravel All regions in the same group
sand S All regions in the same group
silt S S {BSI CPC BDA}, {BDA MR}
clay {BSI BDA MR}, {CPC MR}
S (1 mm) S {BSI CPC MR}, {CPC BDA MR}
N (1 mm) All regions in the same group
H (1 mm) s~ S {CPC BDA MR}, {BSI MR}
J (1 mm) {BSI CPC MR}, {CPC BDA MR}
ALL SPECIES (1 mm) S S

2. Similarity and characteristic species

Let’s have a look at the \(\beta\) diversity within our conditions and sites.

Results of the PERMDISP routine are shown below (mean and SE of the deviation from the centroid for each group, i.e. multivariate dispersion), along with the mean Bray-Curtis dissimilarity for each group. Taxon densities were log(x + 1) transformed, and PERMDISP was run in PRIMER.

Mean within-group Bray-Curtis dissimilarity for each condition or site
Group   Mean deviation   SE of deviation   Mean BC dissimilarity
HI      64.6             0.83              0.917
R       61.9             1.14              0.878
BSI     62.9             1.18              0.903
CPC     60.2             2.25              0.87
BDA     61.1             1.93              0.882
MR      58.2             2.12              0.835

PERMDISP found no significant differences in dispersion for either factor (p = 0.069), nor did the pairwise tests.
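The mean within-group Bray-Curtis dissimilarity reported above can be sketched as follows. This is a minimal illustration and not the PRIMER routine: the densities and the function names (`bray_curtis`, `mean_within_group`) are hypothetical, and the group labels are only borrowed from the table for readability.

```python
import math
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = nothing shared)."""
    total = sum(x + y for x, y in zip(a, b))
    if total == 0:
        return 0.0  # convention chosen here for two empty samples
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def mean_within_group(samples, groups):
    """Mean pairwise Bray-Curtis dissimilarity within each group (needs >= 2 samples)."""
    result = {}
    for label in sorted(set(groups)):
        members = [s for s, g in zip(samples, groups) if g == label]
        pairs = list(combinations(members, 2))
        if pairs:
            result[label] = sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)
    return result


# Hypothetical densities (rows = stations, columns = taxa), not the thesis data.
densities = [[10, 0, 3], [8, 1, 0], [0, 5, 2], [0, 7, 0]]
groups = ["HI", "HI", "R", "R"]
transformed = [[math.log1p(x) for x in row] for row in densities]  # log(x + 1)
within = mean_within_group(transformed, groups)
for label, value in within.items():
    print(label, round(value, 3))
```

The transformation is applied before the dissimilarities are computed, which matches the order described in the text.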

The following analyses identified the species characteristic of each condition. We used results from PRIMER to further justify their selection.

##                       cluster indicator_value probability
## cistenides_granulata        1          0.2836       0.018
## macoma_calcarea             1          0.2326       0.002
## ennucula_tenuis             1          0.1860       0.018
## eudorellopsis_integra       1          0.1395       0.029
## mesodesma_arctatum          2          0.2342       0.007
## harmothoe_imbricata         2          0.1975       0.010
## glycera_alba                2          0.1212       0.039
## psammonyx_nobilis           2          0.1212       0.029
## 
## Sum of probabilities                 =  50.871 
## 
## Sum of Indicator Values              =  5.89 
## 
## Sum of Significant Indicator Values  =  1.52 
## 
## Number of Significant Indicators     =  8 
## 
## Significant Indicator Distribution
## 
## 1 2 
## 4 4
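As a reading aid for the indicator values above, here is a minimal sketch of the Dufrêne-Legendre indicator value, assuming this is the formulation behind the output. For one species, specificity (the group's share of the summed mean abundances) is multiplied by fidelity (the proportion of the group's stations where the species occurs). The function name `indval` and the data are hypothetical, and the permutation-based probabilities are not reproduced.

```python
def indval(abundances, groups):
    """Indicator value of one species for each group, on a 0-1 scale."""
    labels = sorted(set(groups))
    mean_abundance = {}
    frequency = {}
    for label in labels:
        values = [a for a, g in zip(abundances, groups) if g == label]
        mean_abundance[label] = sum(values) / len(values)
        frequency[label] = sum(1 for v in values if v > 0) / len(values)
    total = sum(mean_abundance.values())
    result = {}
    for label in labels:
        specificity = mean_abundance[label] / total if total > 0 else 0.0
        result[label] = specificity * frequency[label]
    return result


# Hypothetical abundances of one species at four stations in two clusters.
values = indval([4, 0, 2, 0], [1, 1, 2, 2])
for cluster, value in values.items():
    print(cluster, round(value, 4))
```

A species scores 1 only when it occurs at every station of one group and nowhere else, so low values such as those in the output indicate partial specificity, partial fidelity, or both.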
SIMPER results (mean Bray-Curtis between-group dissimilarity: 0.926)
  average sd ratio ava avb cumsum
echinarachnius_parma 0.0984 0.136 0.721 0.689 0.42 0.106
mesodesma_arctatum 0.07 0.129 0.542 0.605 0.0995 0.182
cistenides_granulata 0.0609 0.0948 0.643 0.176 0.565 0.248
strongylocentrotus_sp 0.0427 0.0758 0.563 0.27 0.249 0.294
nephtys_caeca 0.0425 0.0556 0.764 0.359 0.23 0.34
limecola_balthica 0.0313 0.0578 0.542 0.234 0.18 0.373
scoloplos_armiger 0.0295 0.065 0.453 0.14 0.256 0.405
macoma_calcarea 0.0274 0.0569 0.482 0 0.312 0.435
harmothoe_imbricata 0.0257 0.0583 0.44 0.217 0.0161 0.462
amphipholis_squamata 0.0238 0.0611 0.389 0.042 0.241 0.488
protomedeia_grandimana 0.0228 0.0538 0.424 0.183 0.169 0.513
psammonyx_nobilis 0.0189 0.0592 0.32 0.185 0 0.533
thyasira_sp 0.0186 0.0469 0.397 0.021 0.241 0.553
ennucula_tenuis 0.0185 0.0422 0.438 0 0.241 0.573
mya_arenaria 0.0174 0.034 0.513 0.063 0.168 0.592
ciliatocardium_ciliatum 0.014 0.045 0.312 0.0908 0.0766 0.607
goniada_maculata 0.0139 0.0354 0.391 0.021 0.173 0.622
glycera_dibranchiata 0.0134 0.043 0.31 0.021 0.0806 0.637
glycera_alba 0.0128 0.0408 0.313 0.172 0 0.65
ameritella_agilis 0.0117 0.0491 0.238 0 0.131 0.663
astarte_undata 0.0117 0.0388 0.301 0.142 0 0.676
astarte_subaequilatera 0.0106 0.0363 0.293 0.134 0 0.687
nucula_proxima 0.00992 0.0349 0.284 0 0.112 0.698
pygospio_elegans 0.00989 0.0449 0.22 0.137 0.0161 0.708
ophelia_limacina 0.00977 0.0299 0.327 0.042 0.0578 0.719
diastylis_sculpta 0.00966 0.0405 0.238 0.0488 0.0322 0.729
eudorellopsis_integra 0.00955 0.0267 0.358 0 0.153 0.74
ampharetidae_spp 0.00948 0.0277 0.342 0.0753 0.0535 0.75
yoldia_myalis 0.00913 0.0285 0.321 0.0543 0.0484 0.76
nephtys_bucera 0.00905 0.0256 0.354 0.063 0.0322 0.77
ampeliscidae_spp 0.00898 0.0253 0.354 0.063 0.0511 0.779
pontoporeia_femorata 0.00877 0.0404 0.217 0 0.132 0.789
bipalponephtys_neotena 0.00836 0.037 0.226 0 0.106 0.798
maldanidae_spp 0.00825 0.0272 0.303 0.0908 0.0322 0.807
pagurus_pubescens 0.00766 0.0231 0.331 0.0753 0.0161 0.815
polynoidae_spp 0.00756 0.0217 0.349 0.021 0.0952 0.823
ampharete_oculata 0.00725 0.0439 0.165 0.0666 0 0.831
phyllodoce_mucosa 0.00643 0.0241 0.267 0 0.106 0.838
phyllodocidae_spp 0.00629 0.0211 0.298 0.021 0.0484 0.845
phoxocephalus_holbolli 0.00621 0.0329 0.189 0 0.0827 0.851
testudinalia_testudinalis 0.00576 0.026 0.222 0.08 0 0.858
harpinia_propinqua 0.00547 0.0253 0.216 0.0753 0.0161 0.864
quasimelita_formosa 0.00486 0.0192 0.253 0 0.0739 0.869
nephtys_ciliata 0.00455 0.0213 0.214 0 0.0645 0.874
platyhelminthes 0.00429 0.0164 0.262 0 0.0484 0.878
lacuna_vincta 0.00427 0.0233 0.184 0 0.0417 0.883
cancer_irroratus 0.00405 0.0143 0.283 0.042 0.0161 0.887
nephtys_incisa 0.00399 0.0185 0.216 0.021 0.0161 0.892
arrhoges_occidentalis 0.00398 0.0167 0.239 0.0543 0 0.896
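The columns of the SIMPER table can be read through the following sketch, which assumes the usual decomposition of the between-group Bray-Curtis dissimilarity into per-taxon contributions. The function name `simper` and the data are hypothetical; the column names simply mirror those of the table. In the table itself, dividing the first average (0.0984) by the overall dissimilarity (0.926) gives the first cumsum value (0.106).

```python
import math


def simper(group_a, group_b, names):
    """Per-taxon contributions to the Bray-Curtis dissimilarity between two groups."""
    contributions = [[] for _ in names]
    for a in group_a:
        for b in group_b:
            total = sum(x + y for x, y in zip(a, b))
            if total == 0:
                continue  # a pair of empty samples carries no information
            for i in range(len(names)):
                contributions[i].append(abs(a[i] - b[i]) / total)
    rows = []
    for i, name in enumerate(names):
        c = contributions[i]
        average = sum(c) / len(c)
        if len(c) > 1:
            sd = math.sqrt(sum((v - average) ** 2 for v in c) / (len(c) - 1))
        else:
            sd = 0.0
        rows.append({
            "taxon": name,
            "average": average,  # mean contribution over all between-group pairs
            "sd": sd,
            "ratio": average / sd if sd > 0 else float("nan"),
            "ava": sum(s[i] for s in group_a) / len(group_a),
            "avb": sum(s[i] for s in group_b) / len(group_b),
        })
    rows.sort(key=lambda r: r["average"], reverse=True)
    overall = sum(r["average"] for r in rows)  # mean between-group dissimilarity
    running = 0.0
    for r in rows:
        running += r["average"]
        r["cumsum"] = running / overall
    return rows, overall


# Hypothetical (already transformed) densities: two stations per group, two taxa.
rows, overall = simper([[4, 0], [2, 0]], [[0, 2], [0, 2]], ["sp1", "sp2"])
for r in rows:
    print(r["taxon"], round(r["average"], 4), round(r["cumsum"], 4))
```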

3. Regressions

3.1. Data manipulation

For the following analyses, the independent variables are habitat parameters and heavy metal concentrations, and the dependent variables are diversity indices. Variables were standardized by mean and standard deviation.
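Standardization by mean and standard deviation can be sketched as below. We assume the sample standard deviation (n - 1 denominator) was used; the function name `standardize` and the values are hypothetical.

```python
import statistics


def standardize(values):
    """Centre on the mean and scale by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical organic-matter values for five stations, not the thesis data.
om = [1.2, 0.8, 2.5, 1.9, 3.1]
om_std = standardize(om)
print([round(v, 3) for v in om_std])
```

After this step every variable has mean 0 and standard deviation 1, so regression coefficients are expressed in standard-deviation units and can be compared across predictors.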

3.1.1. Identification of outliers

To identify stations that are not consistent with the others, we used the multivariate Cook’s Distance (CD) on the uncorrelated variables. Stations with a CD greater than 4 times the mean CD were considered outliers.
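The threshold rule can be illustrated with a simplified case. The sketch below computes Cook's distance for a single-predictor linear regression, which is an assumption made for brevity and not the multivariate version used in the analyses, and then flags observations above 4 times the mean distance. Function names and data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance of each observation in the simple regression y ~ x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


def flag_outliers(distances, factor=4.0):
    """Indices of observations whose distance exceeds factor times the mean distance."""
    threshold = factor * sum(distances) / len(distances)
    return [i for i, d in enumerate(distances) if d > threshold]


# Hypothetical data in which the last station departs from the overall trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 20.0]
distances = cooks_distance(x, y)
flagged = flag_outliers(distances)
print(flagged)
```

Because the threshold is relative to the mean distance, it adapts to the overall scale of the distances in each design.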

Design 1

We identified stations 60, 72, 80 and 96 as general outliers. They were removed from the subsequent analyses of Design 1.

Design 2

We identified stations 108 and 110 as general outliers. They were removed from the subsequent analyses of Design 2.

3.1.2. Correlations between predictors

Correlations have been calculated with Spearman’s rank coefficient.
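For reference, Spearman's coefficient is the Pearson correlation of the ranks, with tied values receiving their average rank. A minimal sketch, with hypothetical function names and data:

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values: a monotonic but non-linear relationship.
silt = [2.0, 5.0, 9.0, 14.0]
clay = [1.0, 4.0, 20.0, 90.0]
print(round(spearman(silt, clay), 3))
```

Working on ranks makes the coefficient sensitive to any monotonic relationship, not only linear ones, and less sensitive to extreme values.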

Design 1

According to these results, the following variables are highly correlated (\(|\rho|\) > 0.80) so they have been considered together in the regressions of Design 1:

  • silt and clay (clay deleted)

We decided to keep sand, even though it is correlated with om, to stay consistent with the 2014 campaign.

Correlation coefficients between habitat parameters (Design 1)
  om gravel sand silt clay
om 1 -0.068 -0.807 0.714 0.706
gravel -0.068 1 -0.192 -0.37 -0.329
sand -0.807 -0.192 1 -0.772 -0.768
silt 0.714 -0.37 -0.772 1 0.973
clay 0.706 -0.329 -0.768 0.973 1
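Applying the \(|\rho|\) > 0.80 rule to the Design 1 matrix above can be sketched as follows. The coefficients are copied from the table; the function name `correlated_pairs` is hypothetical.

```python
names = ["om", "gravel", "sand", "silt", "clay"]
rho = [
    [1, -0.068, -0.807, 0.714, 0.706],
    [-0.068, 1, -0.192, -0.37, -0.329],
    [-0.807, -0.192, 1, -0.772, -0.768],
    [0.714, -0.37, -0.772, 1, 0.973],
    [0.706, -0.329, -0.768, 0.973, 1],
]


def correlated_pairs(names, matrix, threshold=0.80):
    """Pairs of variables whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(matrix[i][j]) > threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs = correlated_pairs(names, rho)
for first, second, value in pairs:
    print(first, second, value)
```

The two pairs returned are om with sand and silt with clay, which are the two cases discussed above.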

Design 2

According to these results, the following variables are highly correlated (\(|\rho|\) > 0.80) so they have been considered together in the regressions of Design 2:

  • cadmium and manganese (manganese deleted)
  • copper, lead and zinc (copper and zinc deleted)

We decided to keep arsenic, even though it is correlated with the copper/lead/zinc group, to stay consistent with the 2014 campaign.

Correlation coefficients between heavy metals concentrations (Design 2)
  arsenic cadmium chromium copper iron manganese mercury lead zinc
arsenic 1 0.492 0.736 0.876 0.773 0.399 0.646 0.816 0.903
cadmium 0.492 1 0.757 0.41 0.766 0.881 0.154 0.708 0.663
chromium 0.736 0.757 1 0.712 0.825 0.767 0.463 0.85 0.879
copper 0.876 0.41 0.712 1 0.633 0.38 0.572 0.829 0.89
iron 0.773 0.766 0.825 0.633 1 0.755 0.429 0.745 0.842
manganese 0.399 0.881 0.767 0.38 0.755 1 0.105 0.584 0.628
mercury 0.646 0.154 0.463 0.572 0.429 0.105 1 0.627 0.545
lead 0.816 0.708 0.85 0.829 0.745 0.584 0.627 1 0.898
zinc 0.903 0.663 0.879 0.89 0.842 0.628 0.545 0.898 1

3.2. Univariate regressions

We used linear models for the regressions on diversity indices. Outliers and correlated variables were removed from these analyses. Variables have been standardized by mean and standard-deviation (coefficients need to be back-transformed to be used in predictive models).
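The back-transformation mentioned above can be sketched as follows, assuming that both the response and the predictors were standardized. A standardized slope b becomes b × sd(y) / sd(x) on the original scale, and the intercept is adjusted accordingly. The coefficients are those of the reduced richness model of Design 1 reported further down; the means and standard deviations are hypothetical placeholders, as is the function name `back_transform`.

```python
def back_transform(intercept_std, slopes_std, y_mean, y_sd, x_means, x_sds):
    """Convert coefficients fitted on z-scores to the original measurement units."""
    slopes_raw = {name: b * y_sd / x_sds[name] for name, b in slopes_std.items()}
    intercept_raw = y_mean + y_sd * intercept_std - sum(
        slopes_raw[name] * x_means[name] for name in slopes_raw
    )
    return intercept_raw, slopes_raw


# Coefficients of S ~ sand + silt (standardized scale).
slopes_std = {"sand": 0.8883, "silt": 1.143}
intercept_std = -0.06123

# Hypothetical means and standard deviations, NOT taken from the thesis data.
y_mean, y_sd = 12.0, 5.0
x_means = {"sand": 70.0, "silt": 15.0}
x_sds = {"sand": 20.0, "silt": 10.0}

intercept_raw, slopes_raw = back_transform(
    intercept_std, slopes_std, y_mean, y_sd, x_means, x_sds
)

# Check: predicting on either scale gives the same richness for a new station.
station = {"sand": 80.0, "silt": 20.0}
z = {k: (station[k] - x_means[k]) / x_sds[k] for k in station}
via_std = y_mean + y_sd * (intercept_std + sum(slopes_std[k] * z[k] for k in z))
via_raw = intercept_raw + sum(slopes_raw[k] * station[k] for k in station)
print(round(via_std, 4), round(via_raw, 4))
```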

3.2.1. Simple regressions

These analyses were done to explore the relationships between variables. Because they produce a large number of results to interpret, only the multiple regressions will be included in the article (see below).

Depth was shown to be important for several parameters in the PermANCOVAs, so the corresponding scatterplots are presented here.

Design 1
Adjusted R-squared of simple regressions for Design 1
  om gravel sand silt
S 0.09824 0.06215 0.0708 0.1258
N 0.01242 0.01491 0.03477 0.03467
H 0.09519 0.03329 0.06053 0.1134
J 0.004809 -0.0122 0.01178 0.01984
p-values of simple regressions for Design 1
  om gravel sand silt
S 0.00425 0.01962 0.01359 0.001309
N 0.1732 0.1542 0.06343 0.06371
H 0.004839 0.06765 0.02101 0.002229
J 0.2504 0.7054 0.1785 0.123
Design 2
Adjusted R-squared of simple regressions for Design 2
  arsenic cadmium chromium iron mercury lead
S -0.01268 -0.04896 -0.03331 -0.04823 -0.047 0.06622
N 0.008407 -0.04909 -0.03615 -0.04682 -0.04877 0.03425
H -0.01205 -0.03027 -0.001362 -0.02749 -0.02325 0.102
J -0.04952 -0.01768 -0.0304 -0.03285 -0.03656 -0.04851
p-values of simple regressions for Design 2
  arsenic cadmium chromium iron mercury lead
S 0.4008 0.8897 0.5762 0.8559 0.8132 0.1303
N 0.2907 0.8964 0.6107 0.8078 0.8796 0.2014
H 0.3967 0.543 0.3361 0.5155 0.478 0.08065
J 0.9251 0.4348 0.5443 0.5708 0.6162 0.8677

3.2.2. Multiple regressions

This section presents the analyses done to determine which variables best explain the parameters.

We used an AIC-based selection procedure to identify the variables that best predict each parameter. The results of the variable selection and the details of the regressions, with diagnostics and cross-validation, are summarized below.
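To make the selection criterion concrete, the sketch below fits ordinary least squares models on every subset of predictors and keeps the one with the lowest AIC. This is an exhaustive search, swapped in for simplicity; the procedure actually used (for example a stepwise search) is not specified here. AIC is computed, up to a constant, as n ln(RSS/n) + 2k with k the number of coefficients, which is an assumption. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def aic(columns, y):
    """AIC up to an additive constant; k counts the intercept and the slopes."""
    n = len(y)
    k = len(columns) + 1
    return n * math.log(rss(columns, y) / n) + 2 * k


def best_subset(predictors, y):
    """Return (AIC, names) of the subset of predictors with the lowest AIC."""
    best = None
    for size in range(len(predictors) + 1):
        for subset in combinations(list(predictors), size):
            score = aic([predictors[name] for name in subset], y)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Hypothetical data: y depends on x1 only, x2 is unrelated noise.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [2.2, 3.9, 6.3, 7.7, 10.1, 11.8, 14.25, 15.75]
score, selected = best_subset(predictors, y)
print(selected)
```

The penalty of 2 per coefficient means an extra predictor is retained only if it reduces the residual sum of squares enough to offset it.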

Design 1
Variable (or combination)   S      N      H      J
om
gravel                             -      +
sand                        +      -      +
silt/clay                   +      -      +      +
Adjusted \(R^{2}\)          0.17   0.1    0.18   0.02
Richness
## FULL MODEL
## Adjusted R2 is: 0.15
Fitting linear model: S ~ om + gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.07118 0.1104 -0.6444 0.5215
om -0.03253 0.2084 -0.1561 0.8765
gravel 0.1478 0.3643 0.4057 0.6863
sand 1.23 0.9469 1.3 0.1982
silt 1.498 0.9953 1.505 0.137
## RMSE from cross-validation: 0.8980579
Variance Inflation Factors
  om gravel sand silt
VIF 2.01 2.35 8.23 9.4

## REDUCED MODEL
## Adjusted R2 is: 0.17
Fitting linear model: S ~ sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06123 0.1061 -0.5769 0.5659
sand 0.8883 0.4034 2.202 0.03102 *
silt 1.143 0.371 3.081 0.002963 * *
## RMSE from cross-validation: 0.8688591
Variance Inflation Factors
  sand silt
VIF 3.55 3.55
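In a model with only two predictors, both share the same variance inflation factor, VIF = 1 / (1 - r²), where r is the Pearson correlation between them. The sketch below inverts this relation for the value reported above; the function names are hypothetical.

```python
import math


def vif_from_correlation(r):
    """VIF shared by both predictors of a two-predictor model: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


def correlation_from_vif(vif):
    """Absolute Pearson correlation implied by a two-predictor VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


implied_r = correlation_from_vif(3.55)  # VIF reported for S ~ sand + silt
print(round(implied_r, 3))
```

The implied value is a Pearson correlation computed on the stations retained in the model, so it need not equal the Spearman coefficient between sand and silt reported in the correlation table.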

Density
## FULL MODEL
## Adjusted R2 is: 0.1
Fitting linear model: N ~ om + gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.08685 0.1204 0.7216 0.473
om 0.25 0.2271 1.101 0.2749
gravel -1.125 0.397 -2.833 0.006085 * *
sand -2.733 1.032 -2.649 0.01006 *
silt -2.591 1.085 -2.389 0.01974 *
## RMSE from cross-validation: 1.185244
Variance Inflation Factors
  om gravel sand silt
VIF 2.01 2.35 8.23 9.4

## REDUCED MODEL
## Adjusted R2 is: 0.1
Fitting linear model: N ~ gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.06937 0.1195 0.5805 0.5635
gravel -0.9239 0.3531 -2.616 0.01094 *
sand -2.179 0.902 -2.416 0.0184 *
silt -1.858 0.8575 -2.166 0.03379 *
## RMSE from cross-validation: 1.161344
Variance Inflation Factors
  gravel sand silt
VIF 2.09 7.18 7.42

Diversity
## FULL MODEL
## Adjusted R2 is: 0.17
Fitting linear model: H ~ om + gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.08487 0.111 -0.7647 0.4471
om -0.1244 0.2094 -0.5938 0.5546
gravel 0.5965 0.3661 1.63 0.1079
sand 2.308 0.9514 2.426 0.01798 *
silt 2.593 1 2.593 0.01168 *
## RMSE from cross-validation: 0.8715363
Variance Inflation Factors
  om gravel sand silt
VIF 2.01 2.35 8.23 9.4

## REDUCED MODEL
## Adjusted R2 is: 0.18
Fitting linear model: H ~ gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.07617 0.1095 -0.6957 0.489
gravel 0.4966 0.3235 1.535 0.1295
sand 2.032 0.8265 2.459 0.01649 *
silt 2.228 0.7857 2.836 0.006011 * *
## RMSE from cross-validation: 0.8738424
Variance Inflation Factors
  gravel sand silt
VIF 2.09 7.18 7.42

Evenness
## FULL MODEL
## Adjusted R2 is: 0
Fitting linear model: J ~ om + gravel + sand + silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.006695 0.1222 -0.05477 0.9565
om -0.1766 0.2307 -0.7655 0.4467
gravel 0.445 0.4032 1.104 0.2737
sand 1.277 1.048 1.219 0.2273
silt 1.533 1.102 1.391 0.1688
## RMSE from cross-validation: 1.022129
Variance Inflation Factors
  om gravel sand silt
VIF 2.01 2.35 8.23 9.4

## REDUCED MODEL
## Adjusted R2 is: 0.02
Fitting linear model: J ~ silt
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.03868 0.1141 0.339 0.7356
silt 0.1809 0.1159 1.561 0.123
## RMSE from cross-validation: 0.9675419
Variance Inflation Factors
  silt
VIF 1

Design 2
Variable (or combination)   S      N      H      J
arsenic
cadmium/manganese
chromium                    -      -      -
iron
mercury
lead/copper/zinc            +      +      +
Adjusted \(R^{2}\)          0.29   0.16   0.21   0
Richness
## FULL MODEL
## Adjusted R2 is: 0.23
Fitting linear model: S ~ arsenic + cadmium + chromium + iron + mercury + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2151 0.1867 1.152 0.2674
arsenic -0.09907 0.3912 -0.2532 0.8035
cadmium -0.06645 0.352 -0.1888 0.8528
chromium -1.191 0.8019 -1.486 0.1581
iron -0.456 0.5699 -0.8002 0.4361
mercury -0.2986 0.2195 -1.361 0.1937
lead 2.031 0.6547 3.103 0.007277 * *
## RMSE from cross-validation: 1.015118
Variance Inflation Factors
  arsenic cadmium chromium iron mercury lead
VIF 2.19 1.86 3.63 2.85 1.21 3.25

## REDUCED MODEL
## Adjusted R2 is: 0.29
Fitting linear model: S ~ chromium + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1818 0.1772 1.026 0.3177
chromium -1.538 0.5749 -2.675 0.01499 *
lead 1.655 0.5249 3.153 0.005237 * *
## RMSE from cross-validation: 0.8191663
Variance Inflation Factors
  chromium lead
VIF 2.7 2.7

Density
## FULL MODEL
## Adjusted R2 is: 0.04
Fitting linear model: N ~ arsenic + cadmium + chromium + iron + mercury + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1452 0.2174 0.6681 0.5142
arsenic 0.2008 0.4553 0.4409 0.6656
cadmium 0.05286 0.4098 0.129 0.8991
chromium -1.021 0.9333 -1.094 0.2912
iron -0.5391 0.6633 -0.8128 0.429
mercury -0.2339 0.2554 -0.9155 0.3744
lead 1.531 0.762 2.009 0.06288
## RMSE from cross-validation: 1.361962
Variance Inflation Factors
  arsenic cadmium chromium iron mercury lead
VIF 2.19 1.86 3.63 2.85 1.21 3.25

## REDUCED MODEL
## Adjusted R2 is: 0.16
Fitting linear model: N ~ chromium + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1392 0.1995 0.698 0.4936
chromium -1.293 0.6472 -1.998 0.06024
lead 1.407 0.5909 2.381 0.0279 *
## RMSE from cross-validation: 0.9483052
Variance Inflation Factors
  chromium lead
VIF 2.7 2.7

Diversity
## FULL MODEL
## Adjusted R2 is: 0.06
Fitting linear model: H ~ arsenic + cadmium + chromium + iron + mercury + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2134 0.205 1.041 0.3143
arsenic -0.2515 0.4294 -0.5857 0.5668
cadmium -0.04641 0.3864 -0.1201 0.906
chromium -0.9912 0.8802 -1.126 0.2778
iron -0.1605 0.6255 -0.2566 0.801
mercury -0.1535 0.2409 -0.6373 0.5335
lead 1.71 0.7186 2.379 0.03107 *
## RMSE from cross-validation: 0.8743848
Variance Inflation Factors
  arsenic cadmium chromium iron mercury lead
VIF 2.19 1.86 3.63 2.85 1.21 3.25

## REDUCED MODEL
## Adjusted R2 is: 0.21
Fitting linear model: H ~ chromium + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1817 0.1848 0.9832 0.3379
chromium -1.15 0.5997 -1.918 0.07024
lead 1.373 0.5476 2.508 0.02137 *
## RMSE from cross-validation: 0.8518271
Variance Inflation Factors
  chromium lead
VIF 2.7 2.7

Evenness
## FULL MODEL
## Adjusted R2 is: -0.23
Fitting linear model: J ~ arsenic + cadmium + chromium + iron + mercury + lead
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06755 0.2492 -0.271 0.7901
arsenic -0.1468 0.5221 -0.2811 0.7825
cadmium 0.1053 0.4698 0.2242 0.8256
chromium 0.7817 1.07 0.7304 0.4764
iron 0.1713 0.7605 0.2252 0.8248
mercury 0.2131 0.2929 0.7275 0.4781
lead -0.8121 0.8737 -0.9295 0.3674
## RMSE from cross-validation: 1.327671
Variance Inflation Factors
  arsenic cadmium chromium iron mercury lead
VIF 2.19 1.86 3.63 2.85 1.21 3.25

## REDUCED MODEL
## Adjusted R2 is: 0
Fitting linear model: J ~ 1
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04179 0.2191 -0.1908 0.8506
## RMSE from cross-validation: 1.05646

Quitting from lines 415-417 (C1_analyses_16B.Rmd) Error in Qr$qr[p1, p1, drop = FALSE] : subscript out of bounds In addition: There were 26 warnings (use warnings() to see them)

3.3. Multivariate regressions

The independent variables are habitat parameters or heavy metal concentrations, and the dependent variables are species abundances. Variables were standardized by mean and standard deviation, and outliers and correlated variables were excluded. Taxon densities were log(x + 1) transformed.

This analysis was done in PRIMER, using DistLM to identify the variables that explain the most community variability and dbRDA to plot the results.

Design 1

The variables selected by the DistLM procedure have an \(R^{2}\) of 0.08.

Design 2

The variables selected by the DistLM procedure have an \(R^{2}\) of 0.27.

